Abstract

Background: Molecular markers facilitate both genotype identification, essential for modern animal and plant
breeding, and the isolation of genes based on their map positions. Advances in sequencing technology have
made it possible to identify single nucleotide polymorphisms (SNPs) in any genomic region. Here, a
sequence-based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic
regions in Arabidopsis is described.

Results: A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained
by applying Illumina's sequencing-by-synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence
with the assembled Columbia-0 (Col-0) genome sequence identified putative single nucleotide polymorphisms
(SNPs) throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating
from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNP-containing
Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to
the respective Nd-0 sequences to identify possible restriction endonuclease site variations. Small
amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a
gel to reveal the sequence-based polymorphisms. By applying this technology, 21 SBP markers representing
polymorphisms between the Col-0 and Nd-0 ecotypes were generated for the marker-poor regions of the Arabidopsis map.

Conclusions: The SBP marker technology described here allowed the development of molecular markers for
targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for
targeted genomic regions of any animal or plant species whose genome sequence has been assembled. This
technology will particularly facilitate the development of high density molecular marker maps, essential for cloning
genes based on their genetic map positions and identifying tightly linked molecular markers for selecting desirable
genotypes in animal and plant breeding experiments.

Keywords: Niederzenz, Solexa sequencing, sequence based polymorphic marker, nonhost resistance, Phytophthora
sojae, SHORE analysis



Background 

Discovery of molecular markers has facilitated mapping
of both qualitative and quantitative traits. Tightly linked
molecular markers facilitate (i) isolation of the genes
encoding these traits and (ii) selection of genotypes
carrying the desirable alleles. Several molecular marker
technologies, such as RFLP, RAPD, DAF, SSR, SSLP,
AFLP, CAPS and SNP, have been developed for molecular
mapping experiments [1-6]. Fingerprinting of genotypes
for restriction fragment length polymorphisms (RFLPs)
has been regarded as the most sensitive method of
genotyping. This procedure, however, requires a large
quantity of genomic DNA and the use of radioactive probes. In
the random amplified polymorphic DNA (RAPD) marker
technology, multiple random loci of the genome
are PCR amplified with a single, 10 nucleotide long
primer of arbitrary sequence [3]. In DNA amplification
fingerprinting (DAF), many loci are PCR amplified with
the aid of a single, short arbitrary primer, as short as
5 nucleotides long [4]. Simple sequence repeat (SSR)
markers, also known as microsatellite markers, utilize the
variation in tandem repeats, such as (CA)n repeats,
observed between genotypes [7]. Simple sequence length
polymorphism (SSLP) markers, similar to SSR markers,
are designed based on a unique segment of genomic
DNA sequence that contains a simple tandem repeat
that distinguishes the genotypes. In Arabidopsis, SSLPs
are largely based on (GA)n repeats [8]. Cleaved
amplified polymorphic sequences (CAPS) markers are
designed based on restriction fragment length
polymorphisms of PCR amplified fragments, when sequence
information of one of the haplotypes is unknown [6].

The high-throughput amplified fragment length poly- 
morphism (AFLP) marker technology combines princi- 
ples of RFLP and random PCR amplification for rapid 
identification of molecular loci of the entire genome [1]. 
AFLP technology is particularly suitable for developing 
high density molecular marker maps, essential for both 
map-based cloning of genes and the isolation of molecu- 
lar markers for selecting desirable genotypes in breeding 
programs. AFLP technology identifies molecular markers 
based on a fraction of the restriction fragment length 
polymorphisms between two genotypes. Restriction site 
associated DNA (RAD) marker technology, on the other 
hand, generates markers for all polymorphic sites of a 
restriction endonuclease between two genotypes; and 
thus, it is a very sensitive marker technology for devel- 
oping a high density molecular map [9]. 

Polymorphisms detected by various marker technologies
have been used to generate molecular marker maps
of those species that do not have genome sequences
or physical maps. Since assembled genome sequences
of many species are now available, and the cost of sequencing
has declined significantly with the advent of next
generation sequencing technologies, the single nucleotide
polymorphism (SNP) is becoming the most popular
molecular marker [10-12]. However, SNP assays are not
always simple or flexible. Here, a strategy of using SNPs
for rapid generation of molecular markers, termed
sequence based polymorphic (SBP) marker technology,
is described.

The assembled Arabidopsis thaliana genome sequence
was selected for this study [13]. Many ecotypes of
this species are available and have been used in mapping
experiments to conduct genetic and biological studies.
SNPs among some of the accessions or ecotypes of this
model plant species are available [14-15; at
http://www.arabidopsis.org]. Niederzenz-0 (Nd-0), used for mapping
the Phytophthora sojae susceptible (pss) mutants that
are infected by the soybean pathogen, P. sojae (R. Sumit,
B.B. Sahu and M.K. Bhattacharyya, unpublished), was
selected for this study. The pss mutants were created in
the pen1-1 mutant of the ecotype Columbia-0 (Col-0).
To facilitate mapping of the putative PSS gene loci
conferring nonhost resistance of Arabidopsis against
P. sojae, SBP markers were developed as follows.
Seventy-five nucleotide long sequencing reads obtained by
Solexa sequencing of the Nd-0 genome were
compared to Col-0 sequences to identify SNPs,
which were subsequently converted to SBP markers if
the sequence of only one of the two ecotypes was cut by at
least one restriction endonuclease at the SNP site. By applying this
technology, 21 co-dominant SBP markers were generated for
the marker-poor regions of the Arabidopsis genome.
This novel SBP marker technology should be applicable
to any higher eukaryotic species with an assembled genome
sequence for rapid development of high density
molecular marker maps for map-based cloning of genes or
identification of suitable molecular markers for selection
of desirable genotypes in breeding programs.

Results 

Generation of a global molecular map for the 
polymorphic loci of the Arabidopsis thaliana ecotypes, 
Col-0 and Nd-0 

Arabidopsis is a nonhost for the soybean pathogen, Phy- 
tophthora sojae. Several putative P. sojae susceptible 
(pss) Arabidopsis mutants that are infected by this 
oomycete pathogen were identified (R. Sumit, B.B. Sahu 
and M.K. Bhattacharyya, unpublished). In order to map 
the putative PSS genes that confer nonhost resistance of 
Arabidopsis against the soybean pathogen, P. sojae, a 
global map of the SSLP and CAPS markers that are 
polymorphic between ecotypes, Col-0 and Nd-0 was 
generated. A group of 126 simple sequence length poly- 
morphism (SSLP) markers (http://www.arabidopsis.org) 
that mapped evenly throughout the entire genome was 
investigated for polymorphisms. Of these, 50 SSLPs 
were polymorphic between the two ecotypes. A group of 
48 cleaved amplified polymorphic sequences (CAPS) 
markers also were investigated for polymorphisms 
between the two ecotypes (Table 1). Of these, 18 were 
polymorphic between the two ecotypes. The map posi- 
tions of all 68 polymorphic SSLP and CAPS markers are 
presented in Additional file 1. Phenotypes of these mar- 
kers are presented in Additional file 2. 
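The marker counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch that only restates the numbers given in the text (126 SSLP and 48 CAPS markers screened, 50 and 18 polymorphic); the variable and function names are illustrative and not part of the original work.

```python
# Marker counts as reported in the text above.
screened = {"SSLP": 126, "CAPS": 48}
polymorphic = {"SSLP": 50, "CAPS": 18}


def polymorphism_rates(screened, polymorphic):
    """Fraction of screened markers that were polymorphic, per marker type."""
    return {kind: polymorphic[kind] / screened[kind] for kind in screened}


rates = polymorphism_rates(screened, polymorphic)
total_polymorphic = sum(polymorphic.values())

for kind, rate in rates.items():
    print(f"{kind}: {polymorphic[kind]}/{screened[kind]} = {rate:.1%}")
print("total polymorphic markers:", total_polymorphic)
```

Roughly 40% of the SSLP markers and 37.5% of the CAPS markers were informative, which adds up to the 68 polymorphic markers placed on the global map.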

Generation of SBP markers for saturating a global 
genome map in Arabidopsis thaliana 

The global genome map of SSLP and CAPS was marker 
poor in some genomic regions (Additional file 1). In 
Table 1 List of CAPS markers polymorphic between Arabidopsis ecotypes Col-0 and Nd-0

CAPS marker   Restriction enzyme      Primer sequence
1H1L-1.6      RsaI, Tsp509I           F:CTAGAGOTGAAAGTOATG; R:TOAGTCCTOTOTCTG
20B4L-1.6     DdeI                    F:CTAAGATGGGAATGTOG; R:GAACTCATOTATGGACC
40E1T7        AccI                    F:GGTCCACmGATOAAGAT; R:GCAAGCGATAGAACATAACG
AF2           DdeI                    F:TCGTCG^mGmCC^mc™; R:CCATOAmAGGCCCCGACmC
B9-1.8        TaqI                    F:CATCTGCAACATCTOCCCAG; R:CGTATCCGCAmCTOACTGC
CAT2          TaqI, Tsp509I           F:GACCAGTAAGAGATCCAGATACTGCG; R:CACAGTCATGCGACTCAAGACTO
ER            DdeI                    F:GAGmATOTGTGCCAAGTCCCTG; R:CTAATGTAGTGATCTGCGAGGTAATC
G4711         DdeI                    F:CCTGTGAAAAACGACGTGCAGmC; R:ACCAAATCTOGTGGGGCTCAGCAG
GPA1.4        Tsp509I                 F:ATOCTOGTCTCCATCATC; R:GGGAmGATGAAGGAGAAC
JM411         DdeI                    F:GCGAACCACTAAGAACTA; R:CTCGACmGCCAAGGAT
LFY3          RsaI                    F:GACGGCGTCTAGAAGATO; R:TAACTTATCGGGCTTCTGC
MI342         Tsp509I                 F:GAAGTACAGCGGCTCAAAAAGAAG; R:TOCTGCCATGTAATACCTAAGTG
M555          AccI                    F:CCmAATOGTOTCAAATC; R:CTCTOAA™™AGTOACTAG
M59           RsaI, Tsp509I           F:GTGCATGATA^GATGTACGC; R:GAATGACATGAACAOTACACC
MBK23A        TaqI                    F:GATGA™GGCGCAAAATOAG; R:AmCCAGCCTGGCTOAGG
PAI1.1        TaqI, RsaI, Tsp509I     F:GATCCTAAGGTA^GATATGATG; R:GGTACAATOATCTOACTATAG
T20D161       TaqI, RsaI, Tsp509I     F:CGTAmGCTGATOATGAGC; R:ATGGmACACTOACAGAGC
T6P5-4.8      RsaI                    F:TGAAAGACACCTGGGATAGGC; R:CCAACmCGGGTCGGTOC

Restriction endonucleases used for generating individual CAPS markers are
shown. F, forward primer; R, reverse primer.



order to fill some of the marker-poor regions, single
nucleotide polymorphism (SNP)-based molecular
markers were generated as follows. First, the Nd-0 genome
was sequenced on an Illumina/Solexa Genome Analyzer
II (GAII) at the DNA facility, Iowa State University.
Three genome equivalents of Nd-0 sequence in 75 bp
reads (Accession No. SRA048909.1) were then analyzed to
discover SNPs between Col-0 and Nd-0 by conducting
reference-guided sequence analysis for all five
chromosomes with the aid of the SHORE program [16].
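To give a sense of scale for "three genome equivalents in 75 bp reads", the relation coverage = (number of reads x read length) / genome size can be rearranged for the number of reads. The sketch below assumes an assembled Col-0 genome size of about 119 Mb; that figure is an assumption introduced here for illustration and is not stated in the text.

```python
def reads_for_coverage(genome_size_bp, read_length_bp, coverage):
    """Number of reads whose total length equals coverage x genome size."""
    total_bases = coverage * genome_size_bp
    # Ceiling division on integers, so a partial read counts as one read.
    return -(-total_bases // read_length_bp)


# Assumption: approximate size of the assembled Col-0 genome (not from the text).
ASSUMED_GENOME_SIZE_BP = 119_000_000

n_reads = reads_for_coverage(ASSUMED_GENOME_SIZE_BP, read_length_bp=75, coverage=3)
print(f"about {n_reads:,} reads of 75 bp for ~3X coverage")
```

Under this assumption, ~3X coverage corresponds to a few million 75 bp reads, which is why many genomic positions are covered by only a handful of reads and additional SNP validation is useful.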

One can also identify candidate SNPs
(NCBI_SS#478443777 through 428555842) for targeted
genomic regions by comparing Nd-0 query sequences
with Col-0 sequence in batches of ~20 kb (Figure 1).
This was achieved by aligning the two sequences using the
BLAST (bl2seq) program at the NCBI website
(http://blast.ncbi.nlm.nih.gov/Blast.cgi). In Solexa sequencing,
many sequencing reads may originate from PCR
products of a single DNA molecule (Additional file 3).
SNPs supported only by 75 bp reads derived from a single
DNA molecule are less reliable, because some of these
apparent polymorphisms may result from PCR-introduced
mutations. This limitation was overcome by
selecting those SNPs that were found in at least two
staggered 75 bp sequence reads (Figure 2). Staggered
reads are considered to originate from independent
DNA molecules. Thus, SNPs observed in at least two
overlapping reads with staggered ends were considered
most likely authentic and were selected for the next step. In
parallel, to eliminate any possible SNPs originating from
sequencing errors in the publicly available Col-0
sequence, the SNP-containing single copy Col-0
sequences were investigated for 100% nucleotide
matches with (i) expressed sequence tags (ESTs) or
(ii) genomic sequences of at least two BACs (Additional
file 4) in GenBank. The Col-0 sequences that met one
of these criteria were considered further for SBP marker
development. Use of the above two criteria in selecting
SNP-containing sequences increased the chance of SBP
marker identification. High quality SNPs identified
through SHORE analysis could be applied directly for
developing SBP markers if genomes are sequenced to a
higher depth (> 20X genome equivalents).
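The staggered-read criterion described above can be sketched as a simple filter. This is a minimal illustration, not the procedure actually used in this study: it assumes that each read carrying a mismatch is summarized as a (read start, SNP position, base) tuple, and it treats reads with identical start positions as possible PCR duplicates of one DNA molecule. The read start coordinates in the example are hypothetical.

```python
from collections import defaultdict


def snps_with_staggered_support(read_snps, min_independent_reads=2):
    """Keep SNPs seen in reads with at least `min_independent_reads` distinct starts.

    read_snps: iterable of (read_start, snp_position, base) tuples, one per read
    that shows a mismatch to the reference at snp_position.
    """
    starts_by_snp = defaultdict(set)
    for read_start, snp_position, base in read_snps:
        # Reads that share a start position are counted once (possible duplicates).
        starts_by_snp[(snp_position, base)].add(read_start)
    return sorted(
        snp for snp, starts in starts_by_snp.items()
        if len(starts) >= min_independent_reads
    )


# Hypothetical read summaries: three staggered reads support one SNP,
# two reads with the same start support another.
example_reads = [
    (25037541, 25037599, "T"),
    (25037561, 25037599, "T"),
    (25037570, 25037599, "T"),
    (25038010, 25038040, "A"),
    (25038010, 25038040, "A"),
]
authentic = snps_with_staggered_support(example_reads)
print(authentic)
```

Only the SNP supported by reads with different ends is retained; the SNP seen twice in reads with the same start is discarded as a possible PCR artifact.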
In the last step of SBP marker development, SNPs
were converted to possible restriction endonuclease
site-specific polymorphisms between the Col-0 and Nd-0
haplotypes by analyzing restriction enzyme digestion
patterns of the selected Nd-0 and Col-0 sequences using
a suitable program (http://tools.neb.com/NEBcutter2/).
PCR amplicons of approximately 200 nucleotides
that contained variations in restriction endonuclease
sites between the Col-0 and Nd-0 ecotypes were considered
as putative SBP markers. Finally, primers for PCR
amplification were designed in such a way that one can easily
distinguish the haplotype-specific restriction fragment
length polymorphisms following separation of the
restriction enzyme digested PCR products on a 4% (w/v)
agarose gel. Following this protocol, 21 SBP markers for
some of the marker-poor regions of the Arabidopsis
genome were identified (Figures 3 and 4; Table 2).
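The conversion of a SNP into a restriction site polymorphism can be illustrated with a short script. This is a sketch only and does not reproduce NEBcutter: it checks a small set of palindromic recognition sites (TaqI, RsaI, DdeI, Tsp509I and MspI, all used in this study), predicts fragment sizes for both alleles, and reports the enzymes that give different patterns. The two 20 bp allele sequences are made up for illustration.

```python
import re

# Recognition site (5'->3') and top-strand cut offset within the site.
# All sites listed here are palindromic, so one strand is sufficient.
ENZYMES = {
    "TaqI": ("TCGA", 1),
    "RsaI": ("GTAC", 2),
    "DdeI": ("CTNAG", 1),
    "Tsp509I": ("AATT", 0),
    "MspI": ("CCGG", 1),
}


def cut_positions(seq, site, offset):
    """Top-strand cut coordinates for every (possibly overlapping) site in seq."""
    pattern = site.replace("N", "[ACGT]")
    return [m.start() + offset for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def fragment_sizes(seq, cuts):
    """Fragment lengths, in order along the amplicon, after cutting at `cuts`."""
    edges = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


def diagnostic_enzymes(allele_a, allele_b, enzymes=ENZYMES):
    """Enzymes whose predicted fragment patterns differ between the two alleles."""
    result = {}
    for name, (site, offset) in enzymes.items():
        frags_a = fragment_sizes(allele_a, cut_positions(allele_a, site, offset))
        frags_b = fragment_sizes(allele_b, cut_positions(allele_b, site, offset))
        if frags_a != frags_b:
            result[name] = (frags_a, frags_b)
    return result


# Made-up alleles differing by one G/A SNP that abolishes a TaqI site.
col_allele = "GGCATCAGTCGAACCTGACA"
nd_allele = "GGCATCAGTCAAACCTGACA"
print(diagnostic_enzymes(col_allele, nd_allele))
```

In this toy case only TaqI distinguishes the alleles: one allele is cut into two fragments and the other remains intact, which is the co-dominant pattern scored on a gel. For real amplicons, fragment size differences must also be large enough to resolve on a 4% agarose gel.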

Discussion 

The use of molecular markers has gained importance in 
genetic studies particularly for map based cloning of 
genes [17]. The relatively low cost of sequencing a gen- 
ome, with the emergence of high throughput sequencing 
technology, has facilitated genome wide polymorphism 
studies [18,19]. The SBP marker technology can convert 
Figure 1 flowchart, steps in order:
(1) Make a sequence database, gsNd-0, of 75 bp Solexa reads of the ecotype Nd-0.
(2) SHORE analysis of gsNd-0 with the Col-0 genome sequence and SNP identification, and/or search sequences of the gsNd-0 database with ~50 Kb Col-0 sequence from a marker-poor region.
(3) Identify at least two Nd-0 75 bp staggered sequence reads containing SNPs.
(4) Corresponding single copy Col-0 sequences should be supported by at least one EST sequence or two BAC clone-derived sequences.
(5) Investigate Nd-0 and Col-0 haplotype sequences containing SNPs for possible restriction site polymorphisms to generate SBP markers.
(6) Confirm SBP markers by conducting PCR, restriction digestion and gel electrophoresis.

Figure 1 Steps in generating SBP markers. Putative SNPs were identified by (i) SHORE mapping of the 75 bp Nd-0 Solexa reads and Col-0
genome sequence and/or (ii) searching SNPs by comparing batches of 50 Kb Col-0 sequences with the 75 bp Nd-0 Solexa reads. The putative
SNP-containing Solexa reads that carried staggered ends were selected for the next step. Assembled genome sequences of Col-0 carrying putative
SNPs were searched for 100% nucleotide matches with transcript sequences or sequences from multiple BACs. The SNPs were utilized to
develop SBP markers if they could be translated to restriction fragment length polymorphisms.



most of the single nucleotide polymorphisms to molecular
markers for any genomic region. SBP markers
developed based on sequence information are ideal for
those species whose genomes are sequenced and
assembled. The reference genome sequence can be utilized
to develop SBP markers for a specific genomic region
with a known physical location. Thus, marker-poor
regions can be enriched with SBP markers. In this
study, the applicability of the SBP marker technology
is shown by improving a genetic
map that represents polymorphisms between two
Arabidopsis ecotypes, Col-0 and Nd-0 (Figure 4). SBP
markers were generated from just three genome equivalents
of Nd-0 genome sequence in 75 bp Solexa reads. The
method has also been applied successfully in developing
a high density molecular map of the PSS1 gene that
confers nonhost resistance against the soybean
pathogens, Phytophthora sojae and Fusarium virguliforme (R.
Sumit, B.B. Sahu and M.K. Bhattacharyya, unpublished).

The SHORE program used in this study is
powerful and has been employed successfully to identify
a mutation through analysis of deep sequence
data from a bulk of 500 mutant F2 progeny [20]. If
genome sequencing is not conducted to a high depth
(e.g. > 20 fold), SNPs identified through SHORE
analyses can be verified by conducting BLAST analyses.
In such a scenario, staggered Solexa sequence reads (Figure 2)
containing SNPs are considered for generating SBP markers.
Similarly, candidate SNP-containing
regions of the reference genome should be supported by
multiple sequences, such as transcript sequences and/or
sequences from more than one BAC clone, to avoid any
possible sequencing errors (Additional file 4).

If none of the haplotypes of interest has an assembled genome
sequence, the reference genome sequence should be used to
define the SNP maps of the individual haplotypes by
running the SHORE program. The SNP maps can then be
compared to determine the SNPs between the
haplotypes of interest. Once the candidate SNPs are
identified, small PCR amplicons of ~200 bp can be amplified
and digested with suitable restriction endonucleases to
reveal the restriction fragment length polymorphisms. A
significant proportion of the SNPs could be unusable in SBP
marker development because they may not be digested
Figure 2 Identification of SNPs for generation of SBP markers in Arabidopsis thaliana. (a) SHORE analysis of a 2.95 Kb DNA fragment of
the lower arm of chromosome V between 25,034,700 and 25,037,650 bp resulted in four SNPs. Name, name of the project; Position, position
within the chromosome; Ref base, nucleotide of the sequenced genome (Col-0); Cons base, consensus base (Nd-0); Read type, part of the reads
used for prediction were non-repetitive; Support, number of reads supporting a predicted feature; Concordance, ratio of reads to total coverage
of the sequenced genome; Max Quality, highest base quality supporting a prediction; Avg hits, average number of alignments of all reads
covering this genomic position. (b) Two SNPs at positions 25,037,599 and 25,037,602 nucleotides [in bold font in (a)] were aligned in three Nd-0
Solexa reads with staggered ends. (c) Three 75 bp Nd-0 Solexa reads were aligned with the reference genome Col-0 (Query Sequence). Two
SNPs were circled. Note that the three reads were from three independent DNA molecules.



with restriction endonucleases in a haplotype- or geno- 
type-specific manner. In such a case, one can apply 
derived CAPS (dCAPS) technology to improve the effi- 
ciency of SBP marker development [21]. 
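The comparison of two SNP maps mentioned above, for the case where neither haplotype of interest is the reference, can be sketched as follows. This is an illustrative outline under stated assumptions: each SNP map is represented as a dictionary from (chromosome, position) to the called base, and a position absent from a map is taken to carry the reference base. In practice a missing call may also reflect lack of read coverage, so such sites would need verification. All coordinates in the example are hypothetical.

```python
def snps_between_haplotypes(snps_a, snps_b):
    """Sites where two haplotypes differ, given their SNP maps against one reference.

    snps_a, snps_b: dicts {(chromosome, position): base}.
    Returns {(chromosome, position): (base_a, base_b)}; None means "no SNP called",
    which is assumed here to mean the reference base.
    """
    differing = {}
    for site in sorted(set(snps_a) | set(snps_b)):
        base_a = snps_a.get(site)
        base_b = snps_b.get(site)
        if base_a != base_b:
            differing[site] = (base_a, base_b)
    return differing


# Hypothetical SNP maps of two haplotypes relative to the same reference.
haplotype_a = {("5", 100): "T", ("5", 250): "G"}
haplotype_b = {("5", 100): "T", ("5", 300): "A"}
candidates = snps_between_haplotypes(haplotype_a, haplotype_b)
print(candidates)
```

The site shared by both haplotypes drops out, because it does not distinguish them, while sites called in only one haplotype remain as candidates for SBP or dCAPS marker design.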

Conclusions 

A new molecular marker technology, based on genome
sequence and physical map locations, is reported for those
species whose assembled genome sequences are available.
The technology was applied in identifying 21 SBP markers
for some of the marker-poor genomic regions of the
Arabidopsis molecular marker map that represent
polymorphisms between the ecotypes Col-0 and Nd-0 (Figure 4). The
SBP marker technology should be applicable to any
genomic region and will facilitate (i) map-based cloning of genes
as well as (ii) the development of tightly linked molecular
markers for selecting desirable genotypes in animal and
plant breeding experiments.
[Figure 3: gel images of the SBP markers; lanes C and N, band sizes in bp. Chromosome I: SBP1 0.95, SBP1 5.25, SBP1 8.08, SBP1 11.09, SBP1 18.63, SBP1 22.31, SBP1 25.92. Chromosome II: SBP2 14.07, SBP2 16.23. Chromosome III: SBP3 6.6, SBP3 8.11, SBP3 10.28, SBP3 11.12, SBP3 23.46.]

Ease in SBP marker development and application to 
any genomic regions, and genome-wide abundance of 
SNPs make this technology suitable for mapping experi- 
ments, especially to develop high density molecular 
maps for positional gene cloning experiments, if the 
assembled genome sequence and physical maps of the 
studied species are available. Innumerable SBP markers 
can be developed rapidly for a genomic region contain- 
ing a target gene in a map-based cloning experiment. 
Co-dominant gel-based SBP markers are ideal to identify 
genetic recombination events between two loci. Such 
PCR-based markers can be used to screen a large num- 
ber of segregants to identify informative recombinants 
of the target gene region. These recombinants will then
facilitate the development of high resolution maps of a
large number of SBP markers, essential for cloning
genes based on their map position. Thus, high-throughput
deep sequencing, together with SBP markers, should
expedite map-based cloning in higher eukaryotes.

Methods 

Plant materials and growth conditions 

Seeds of Arabidopsis thaliana ecotypes, Col-0 and Nd-0,
were sown on LCI soil-less mixture (Sun Gro Horticulture,
Bellevue, WA) under a 16 h light/8 h dark regime at
21°C with approximately 60% relative humidity. The
light intensity was maintained at 120-150 µE/m²/s [22].
Ten days after sowing, the seedlings were transplanted
Figure 4 The molecular map of the five Arabidopsis chromosomes showing the locations of the SBP markers. Primers for 18 CAPS and
21 SBP markers are listed in Tables 1 and 2, respectively. Primer sequences for the 50 SSLP markers can be obtained from the TAIR database.
CAPS markers are distinguished from SSLP markers with asterisks.



in LCI mixture. The newly transplanted seedlings were 
covered with humidity domes for two days and there- 
after watered every fourth day. A fertilizer mixture of 
15:15:15::N:P:K (1% concentration v/v) was applied to 
the seedlings seven days after transplantation. 

DNA preparation and the whole genome sequencing 

Genomic DNA was extracted from Arabidopsis by the
CTAB method [23]. Either a young inflorescence or a
rosette leaf was selected for DNA extraction. The Nd-0
genome was sequenced on an Illumina/Solexa sequencing
platform at the DNA facility, Iowa State University. The
75 bp Solexa Nd-0 reads were saved as the gsNd
database (Accession No. SRA048909.1) for further studies.

Analysis of the raw reads from Solexa Sequencing 

The raw 75 bp Solexa reads of the gsNd database were
analyzed by the mapping algorithm, Efficient Large-Scale
Alignment of Nucleotide Databases (ELAND),
which is built into the Solexa sequence analysis
Table 2 Primers and restriction enzymes used in generating 21 SBP markers

Ch. No.  Name         Primer sequence                                                    Restriction enzyme  Amplicon size (bp)
1        SBP1_0.95    F:GTCAGGCTAGCTCATCAAGTCCTAC; R:CCGGAmCGCCAGCTCCGTC                 Tsp509I             100
1        SBP1_5.25    F:CACAAACCC^CACCTCCAT; R:GCAGTOCCTAAAGGCTGAG                       MspI                233
1        SBP1_8.08    F:AACGCAATOTCAAGCAGGT; R:CATOAATOTOGGCAGTG                         RsaI                188
1        SBP1_11.09   F:AAAGTCAACCGGGAGGmC; R:AGGCTGAGGACACGAGAAGA                       BamHI               163
1        SBP1_18.63   F:GCAOTGCAAAAGGAAGCTC; R:^OTGCTGGAGAATCGTGA                        RsaI                197
1        SBP1_22.30   F:TACCGGTOCGGTCACTATC; R:AATGGGAAATOGGATOGT                        DdeI                151
1        SBP1_25.92   F:TOTOAGAGAGCGAGATCAAA; R:AAAAGCATCACATCATCmGG                     NheI                102
2        SBP2_14.07   F:GAAGGAATOGACCAAACGA; R:ATCTAGCTGCCCTCACTGGA                      BtsCI               213
2        SBP2_16.23   F:CACCAmGTOCCGTAAGC; R:TGGTCAATCCATGGTGATGT                        Hpy166II            157
3        SBP3_6.60    F:CCATCGTCCTATOTAATCCATGTO; R:GATGCAAAATCTCCATCCTOTC               Tsp509I             379
3        SBP3_8.11    F:CACGTATCGGCGAGTCTACA; R:CAAATOAAATCTCAG^CGTC                     TaqI                150
3        SBP3_10.28   F:TCTAAAACGAACCGGGAAAA; R:CGACAAGTAAATOAAACCAACCTG                 MboII               151
3        SBP3_11.12   F:AAGACmGGTOAACTCCTGAA; R:GGCmGGATOAGGAAAAA                        TaqI                184
3        SBP3_23.46   F:CGACCAAATGTCTCTGAGATGTO; R:CACCCAAGGCGGTGTOGCGAAAG               TaqI                520
4        SBP4_0.67    F:CGGTOACATGCCTCAATCC; R:TGTGGATGAmGGGGACTC                        DdeI                171
4        SBP4_2.91    F:CGAGTGAOTOTGAGGmATOTG; R:CGAGATOCmGGTATGGA                       Hpy166II            249
4        SBP4_5.60    F:AGGGAAGAATATGCGGAAGG; R:TGmCTGTOTGGCCCA^                         TaqI                159
4        SBP4_6.51    F:GGACAAGACOTGAmGAAGmG; R:GAGGGCTCACATOGGmAATG                     Tsp509I             395 (C), 490 (N)
5        SBP5_8.40    F:TCGACGGTGAOTGTAGGTG; R:CGATGCCGTCTCATAAAAGG                      DpnI                232
5        SBP5_14.60   F:CGCGGTOTGGTAACGTOAATG; R:CCGAGGGAGAAGAAAGGATCAAGAAG              HphI                225
5        SBP5_25.00   F:AAATCACCAATGGCAAAACA; R:mGCGTAGACGGAGAGTGA                       DdeI                191

Restriction endonucleases used for generating individual SBP markers are shown. F, forward primer; R, reverse primer; C, Col-0; N, Nd-0.



pipeline of the Illumina sequencer [24]. This program
can match a large number of reads against a reference
genome sequence; e.g., in this study the Arabidopsis
Col-0 genome sequence was used as the reference
genome. In order to identify the SNPs from the entire
Arabidopsis genome (NCBI_SS#478443777 through
428555842), the 75 bp Solexa sequence reads of Nd-0
were compared to the assembled Col-0 genome
sequence (version TAIR10)
(ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/) by running
the SHORE program [25]. The gsNd database also was
used for conducting the BLASTN (bl2seq) search for
polymorphic sequences of the marker poor genomic
regions.
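Once SNP calls are available, restricting them to a marker-poor interval is a simple filtering step. The sketch below assumes a tab-delimited table whose column names (chromosome, position, ref_base, cons_base, support) are hypothetical and merely modeled on the fields described in the Figure 2 legend; it does not reproduce the actual SHORE output format. The example rows and their support values are illustrative.

```python
import csv
import io

# Illustrative tab-delimited SNP table; column names and values are hypothetical.
SNP_TABLE = (
    "chromosome\tposition\tref_base\tcons_base\tsupport\n"
    "5\t25037599\tC\tT\t4\n"
    "5\t25037602\tA\tG\t4\n"
    "5\t25038000\tG\tA\t1\n"
    "5\t25100000\tT\tC\t5\n"
)


def snps_in_region(table_text, chromosome, start, end, min_support=2):
    """Return (position, ref_base, cons_base) for well-supported SNPs in a region."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        position = int(row["position"])
        in_region = row["chromosome"] == chromosome and start <= position <= end
        if in_region and int(row["support"]) >= min_support:
            hits.append((position, row["ref_base"], row["cons_base"]))
    return hits


region_hits = snps_in_region(SNP_TABLE, "5", 25_030_000, 25_040_000)
print(region_hits)
```

The low-support call and the call outside the interval are dropped; the remaining SNPs are the candidates to examine for restriction site differences.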

SSLP and CAPS markers polymorphic between Col-0 and 
Nd-0 

Candidate SSLP and CAPS markers available from the
TAIR database were selected to cover the entire genome.
Sequence information of primers for SSLP markers
was obtained from Bell and Ecker [9] and the
Arabidopsis Information Resource (TAIR) database
(http://www.arabidopsis.org). The chromosome map tool
function available at the TAIR database
(http://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp) was used
to map the physical locations of the markers that
showed polymorphisms between the two accessions.

PCR conditions and digestion with restriction 
endonucleases 

The final DNA concentration in PCR was 20 ng/µl. The
PCR mixtures contained 2 mM MgCl2 (Bioline,
Taunton, MA), 0.25 µM each of forward and reverse primer,
2 µM dNTPs and 0.5 U Choice Taq polymerase
(Denville Scientific, Inc., Metuchen, NJ). For SBP or SSLP,
PCR was conducted at 94°C for 2 min, and then 40
cycles of 94°C for 30 s, 50°C or 55°C for 30 s and 72°C
for 30 s. Finally, the mixture was incubated at 72°C for
10 min. For CAPS markers, PCR was conducted at 94°C
for 2 min, and then five cycles of 94°C for 30 s followed
by decreasing annealing temperatures from 55°C to 50°C
(-1°C/cycle) and 72°C for 1 min. Then 35 cycles of 94°C
for 30 s, 50°C for 30 s, and 72°C for 1 min were
conducted. Finally, the reaction mixtures were incubated at
72°C for 10 min. PCR was carried out in PTC-100
Programmable Thermal Controllers (MJ Research Inc.,
Waltham, MA). The amplified products were resolved
on a 4% (w/v) agarose gel at 8 V/cm. Amplified CAPS
and SBP products were digested with the respective
restriction enzymes following manufacturer's protocols.
The ethidium bromide stained PCR products were
visualized by illuminating with UV light.
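The touchdown program used for the CAPS markers can be written out cycle by cycle. The following sketch encodes one reading of the description above: five touchdown cycles with annealing at 55, 54, 53, 52 and 51°C, followed by 35 cycles at 50°C. The 30 s annealing time during the touchdown cycles is an assumption, since the text gives it only for the later cycles, and ramping time between steps is ignored.

```python
def touchdown_cycles(start_anneal, end_anneal, step, plateau_cycles,
                     denature=(94, 30), anneal_seconds=30, extend=(72, 60)):
    """List of cycles; each cycle is a list of (temperature_C, seconds) steps."""
    cycles = []
    temperature = start_anneal
    while temperature > end_anneal:  # touchdown phase, one step down per cycle
        cycles.append([denature, (temperature, anneal_seconds), extend])
        temperature -= step
    for _ in range(plateau_cycles):  # constant annealing temperature
        cycles.append([denature, (end_anneal, anneal_seconds), extend])
    return cycles


def total_seconds(cycles, initial=(94, 120), final=(72, 600)):
    """Programmed hold time, including initial denaturation and final extension."""
    cycling = sum(seconds for cycle in cycles for _, seconds in cycle)
    return initial[1] + cycling + final[1]


caps_cycles = touchdown_cycles(start_anneal=55, end_anneal=50, step=1, plateau_cycles=35)
print(len(caps_cycles), "cycles,", total_seconds(caps_cycles) / 60, "min of programmed holds")
```

Under these assumptions the CAPS program comprises 40 cycles in total, the same number as the SBP and SSLP program, with about 92 min of programmed hold time.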

Additional material 



Additional file 1: Arabidopsis molecular genome map generated 
based on SSLP and CAPS markers that are polymorphic between 
Col-0 and Nd-0 ecotypes. CAPS markers shown with asterisks. The map 
was drawn using the chromosome map tool available at TAIR. 

Additional file 2: Phenotypes of the SSLP and CAPS markers 
polymorphic between Col-0 and Nd-0 ecotypes. C, Col-0; N, Nd-0. 

Additional file 3: Two 75 bp Nd-0 Solexa reads most likely 
originated from a single DNA molecule. The two reads showed 
similarity in their identity to a specific Col-0 sequence. Most likely the 
two sequence reads were obtained from sequencing of two molecules 
generated through PCR of a single DNA molecule. 

Additional file 4: The Col-0 sequence carrying SNPs, shown in 
Figure 2 (a), showed identity to three cDNA sequences. 